Abstract

Vertex similarity and graph matching are two significant 
problems with numerous important applications in different 
scientific fields, such as data mining, social network analysis, 
computer vision, biology, chemistry and many more. A 
key problem towards the design of successful algorithms is 
finding a good definition of similarity. 

In this work we provide novel perspectives on both prob- 
lems. Inspired by the concept of archetypal analysis intro- 
duced by Cutler and Breiman [24], we propose for the vertex 
similarity problem a geometric optimization problem which 
allows us to express each vertex uniquely as a convex com- 
bination of few extreme types of vertices. We use this map- 
ping of vertices to vectors of mixture coefficients to define 
vertex similarity. Furthermore, our method has the advan- 
tage of supporting efficiently several types of queries such 
as "which other vertices are most similar to this vertex?" by 
the use of the appropriate data structures. With respect to 
the graph matching problem between two graphs G, H, we 
propose the generalized condition number κ(L_G, L_H) of the
Laplacian matrix representations of G, H (a quantity widely
used in numerical analysis) as a measure of graph similarity.
We show that this objective has a solid theoretical basis and 
propose a deterministic and a randomized graph alignment 
algorithm. 

We validate our algorithms on both synthetic and real 
data. We present promising and interesting findings in our 
experimental results. Furthermore, we discuss in detail 
several aspects of our work and propose several research 
directions. 

1 Introduction 

Similarity is a significant concept perceived since ancient times, see, e.g., the Nicomachean Ethics by Aristotle. It has been an object of study of various scientific fields, ranging from psychology and philosophy to chemistry and computer science. Defining similarity is a major philosophical problem, if any widely acceptable definition exists at all. Even in the context of this work, which is graph and vertex similarity, a major step towards a successful quantification of similarity between two vertices of a graph or two graphs is finding a good definition. Typically, upon having a definition, one can come up with several algorithms and heuristics to find similar vertices within a graph or high-quality graph matchings between two graphs. Graph similarity has a broad range of applications and plays a central role in numerous fields. Indicatively, we report applications in computer vision (e.g., optical character recognition, biometric identification), chemistry (e.g., chemical compounds comparison [38]), databases (e.g., query optimization [79], structure matching [60]), biology (e.g., protein-protein interaction networks, phylogenetics [39]), information retrieval (e.g., automatic extraction of synonyms in a monolingual dictionary [15, 16]), social networks (e.g., link prediction [57], viral marketing [52]), and the World Wide Web (e.g., anomaly detection [63]). Furthermore, as "birds of a feather flock together", understanding whether or not two vertices (e.g., two users) of a network (e.g., an online social network) are similar automatically implies an improved ability to predict the emergence of an edge between them (e.g., an online friendship). Notice that this improved prediction ability may be used for malicious purposes as well, e.g., privacy attacks [36].

In this work we are interested in two problems, both 
closely related to the concept of similarity. On purpose, 
we state the problems abstractly, using quotes in several 
places, in order to emphasize that a major contribution of 
our work are two novel formalizations of these questions. 
The first problem is: given an undirected graph G(V, E)
and two vertices u, v ∈ V, how "similar" are u and v?
The second problem is: given two graphs G(V, E_G),
H(V, E_H), is there a permutation of the vertices of H that
"reveals any similarities" between G and H? Can we find
such a permutation? From now on we shall call the first and
the second problem the vertex similarity and the graph
matching problem respectively.

We propose two novel approaches to the aforementioned problems. Our approach to the first problem is geometric. Specifically, in order to quantify the similarity of two vertices we find an informative embedding of the graph in the Euclidean space and learn a simplex with minimum possible volume which encloses the data points. Fig. 1(a) shows a minimum area 2-simplex S for a normalized Laplacian graph embedding [13] of the largest component of the Netscience network, see Table 1. This allows us to express each data point, and hence each vertex of the Netscience graph, as a unique convex combination of the simplex vertices. Comparing the mixture coefficients of two vertices allows us to quantify their similarity. For instance, using the Euclidean distance between the mixture coefficients (the smaller the distance, the more similar the vertices) we find that the pair 'Ravi Kumar' and 'Andrew Tomkins' is highly similar. Furthermore, the 3 extreme points of S correspond to three different archetypes, i.e., types of research. Specifically, the three vertices of the simplex lie close to Kurths and Boccaletti; Barabasi and Jeong; and Kumar, Raghavan, Tomkins and Rajagopalan, which are respectively three authoritative groups of researchers on social networks. Our method has the important advantage of efficiently supporting queries of the type "which other vertices are most similar to this vertex?", "which are the most dissimilar vertices to this vertex?" etc. by the use of the appropriate data structures, e.g., see [7].

Figure 1: (a) Archetypes and Vertex Similarity. (b) Graph Matching. Read Section 1 for the accompanying details. (a) Minimum area 2-simplex S for an informative embedding of the largest component of the Netscience network G, see Table 1 for the dataset details. S allows us to express each data point as a unique convex combination of its extreme points and hence call two vertices of G similar if their corresponding mixture coefficients are close. (b) Permutahedron V of the symmetric group S_4. The 3 shaded vertices of V define a hypothetical set of isomorphisms between G and H.

Our approach to the graph matching problem introduces a novel criterion of similarity between two graphs G, H, the generalized condition number of their Laplacian matrix representations. Its choice has a solid theoretical basis. Specifically, consider Figure 1(b), which shows the permutahedron V of the symmetric group S_4 with three shaded vertices which correspond to three hypothetical permutations which make graph H identical to graph G. Theorem 4.1 proves that our proposed randomized algorithm in the limit of λ → +∞ (λ > 1 is a parameter in our algorithm) converges to the uniform distribution over the shaded set, i.e., π(Shaded Vertex) = 1/3, π(Non-Shaded Vertex) = 0.

The paper is organized as follows: Section 2 briefly presents related work on vertex similarity and graph matching and the theoretical preliminaries for our work. Sections 3 and 4 present a novel approach to the vertex similarity and the graph matching problem respectively. Section 5 shows an experimental validation and evaluation of our proposed methods. Section 6 discusses several aspects of our work and proposes new research directions. Finally, Section 7 concludes the paper.

2 Related Work 

In this Section we present related research work and theoretical preliminaries. Due to the large amount of research on the vertex similarity and the graph matching problems, which additionally spans a wide range of scientific fields, it is impossible to exhaust the literature here. The interested reader is urged to read the bibliographies of the papers cited herein.

2.1 Vertex Similarity. According to [49], the problem of vertex similarity had received less attention compared to other problems such as community detection, vertex centralities etc. until 2005. In the author's opinion the same claim more or less holds today, and most of the existing research work is dominated by the same idea: two vertices are similar if their neighbors are similar. Unsurprisingly, the recursive nature of this idea leads to recursive algorithms. Before describing the widely used recursive algorithms it is worth pointing out that other measures of similarity exist: the number of common neighbors, Jaccard's coefficient, Salton's coefficient, the Adamic/Adar coefficient [3] etc. These measures have significant shortcomings. For instance, two vertices may be highly similar even if they share no common neighbors. For a fairly detailed discussion see [49].
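The local measures listed above can be sketched in a few lines. The following is only an illustration under stated assumptions: the function name `local_similarities` and the adjacency-dictionary representation are hypothetical, and the formulas are the standard textbook definitions of these coefficients rather than code taken from the cited works. The 4-cycle example illustrates the shortcoming mentioned above: adjacent vertices of a cycle are structurally equivalent, yet they share no common neighbor.

```python
import math


def local_similarities(adj, u, v):
    """Local similarity scores for vertices u, v.

    adj maps each vertex to the set of its neighbours (undirected, no self-loops).
    """
    nu, nv = adj[u], adj[v]
    common = nu & nv
    union = nu | nv
    jaccard = len(common) / len(union) if union else 0.0
    salton = len(common) / math.sqrt(len(nu) * len(nv)) if nu and nv else 0.0
    # Adamic/Adar: each common neighbour z is weighted by 1 / log(degree(z)).
    adamic_adar = sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1)
    return {
        "common": len(common),
        "jaccard": jaccard,
        "salton": salton,
        "adamic_adar": adamic_adar,
    }


# 4-cycle 0-1-2-3-0: every vertex plays the same structural role.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(local_similarities(cycle, 0, 2))  # opposite vertices share both neighbours
print(local_similarities(cycle, 0, 1))  # adjacent vertices share none
```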

Probably the algorithm that has influenced and motivated a large and significant part of the work on vertex similarity is Kleinberg's HITS algorithm [44]. Interestingly, Blondel et al. [15, 16] generalized HITS and provided a general scheme for finding the similarity of two vertices. Jeh and Widom proposed the SimRank algorithm [41] to compute all-pairs vertex similarities in a graph. Since then many improvements and extensions of the SimRank algorithm have appeared, e.g., [37, 55]. Leicht et al. propose another recursive measure of similarity closely resembling the centrality measure of Katz [49]. Not surprisingly, as in the HITS algorithm [44], many such algorithms are closely connected to spectral graph theory, both in their formulation and in the proof of convergence. Spectral graph theory, and specifically the theory of random walks, provides the basis for a rich set of similarity measures including, e.g., commute times [26, 28] and the closely related graph kernel methods, e.g., [28, 78]. More recursive schemes have been proposed over the last years, including [39, 83]. Needless to say, finding similar vertices has plenty of applications. Indicatively, we report link recommendation [57], schema matching [60] and privacy attacks [36].
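To make the recursive idea concrete, here is a minimal sketch of a SimRank-style fixed-point iteration. It is an illustration, not the implementation of [41]: the function name `simrank`, the decay value 0.8, the fixed iteration count and the use of plain neighbor sets of an undirected graph are all assumptions made for this example.

```python
def simrank(adj, decay=0.8, iterations=10):
    """Naive SimRank-style iteration on an undirected graph.

    adj maps each vertex to the set of its neighbours. The score of (a, b) is
    the decayed average score over all pairs of neighbours of a and b, and
    every vertex has score 1 with itself.
    """
    nodes = sorted(adj)
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                elif not adj[a] or not adj[b]:
                    new[a][b] = 0.0
                else:
                    total = sum(sim[i][j] for i in adj[a] for j in adj[b])
                    new[a][b] = decay * total / (len(adj[a]) * len(adj[b]))
        sim = new
    return sim


# Path 0-1-2: the two endpoints have the same single neighbour.
path = {0: {1}, 1: {0, 2}, 2: {1}}
scores = simrank(path)
print(scores[0][2], scores[0][1])
```

The quadratic loop over vertex pairs is kept deliberately simple; it is only practical for small graphs.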

Our geometric perspective on the problem of vertex sim- 
ilarity in Section [3] has not been considered in the literature 
to the best of our knowledge. 

2.2 Archetypal Analysis. The idea of archetypal analysis originated with Breiman during his work on predicting next-day ozone levels: each day should be expressed as a mixture of "extreme" or "archetypal" days. Cutler and Breiman introduced archetypal analysis and proposed an alternating minimization procedure [25]. Archetypal analysis has important practical applications in various fields including dynamical system analysis [73], computational biology [40], marketing [64] and many more.

2.3 Spectral Unmixing. Spectral unmixing is a central problem in spectral imaging. Specifically, according to [42], "..spectral unmixing is the procedure by which the measured spectrum of a mixed pixel is decomposed into a collection of constituent spectra, or endmembers, and a set of corresponding fractions, or abundances." Keshava [42] surveys existing algorithms for this problem. Of special interest to us is the geometric approach, inspired by Craig's seminal work [23]. Several algorithms have been proposed for this problem [27, 56, 75, 76] and have found applications in several domains including computational biology [74].

The computational complexity of the algorithms is a function of the dimensionality k of the simplex. Specifically, when k = 2 there exist efficient algorithms for finding the minimum area enclosing triangle [61]. When k = 3 Zhou and Suri give an algorithm with complexity O(n^4) [84]. Packer showed that the problem is NP-hard when k > log(n) [62].

2.4 Graph Matching. A large amount of research work has been devoted to the graph matching problem by several research groups. Umeyama's influential paper began a whole line of research [77]. Specifically, Umeyama proposed that instead of trying to find a permutation matrix P that minimizes ‖P A_G P^T - A_H‖, where A_G, A_H are the adjacency matrix representations of graphs G, H, one may relax the problem to finding an orthogonal matrix P' that minimizes the same objective. Then one can find a closed form solution for the optimal P'. More generally, many methods relax the constraint of searching for a permutation matrix P, i.e., one of the vertices of the Birkhoff polytope [18] (a polytope whose vertices correspond to permutation matrices, see the closely related permutahedron in Figure 1(b)), to finding doubly stochastic matrices. For the case of graph similarity between graphs with a different number of vertices, typically a bipartite graph is first created which is widely known as the compatibility graph. The graph matching problem can be formulated as an integer quadratic problem (IQP) which can be tackled in various ways. Dominating approaches include semidefinite programming [65], spectral approaches [19, 69], linear programming relaxations [4, 43] and the popular graduated assignment method [31]. The latter relaxes the IQP into a non-convex quadratic program and solves a sequence of convex optimization approximation problems. Blondel et al. [15, 16] use a generalization of the HITS method [44] to find graph matchings. As we mentioned in Section 2.1, their method is also applicable to the vertex similarity problem. Other approaches include the combination of the Singular Value Decomposition and the Expectation-Maximization algorithm [53, 54], Belief Propagation [12] and kernel-based methods [70, 80].
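For very small graphs the permutation objective above can be evaluated exhaustively, which makes the quantity being relaxed explicit. The sketch below is not Umeyama's method (it performs no relaxation at all); the function name `best_permutation` and the nested-list adjacency representation are assumptions made for illustration, and the cost is the squared Frobenius norm of P A_G P^T - A_H.

```python
from itertools import permutations


def best_permutation(A_G, A_H):
    """Exhaustively minimise ||P A_G P^T - A_H||_F^2 over all permutations.

    sigma[i] is the vertex of H matched to vertex i of G. The search visits
    n! permutations, so it is only usable for tiny graphs.
    """
    n = len(A_G)
    best_cost, best_sigma = None, None
    for sigma in permutations(range(n)):
        cost = sum(
            (A_G[i][j] - A_H[sigma[i]][sigma[j]]) ** 2
            for i in range(n)
            for j in range(n)
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_sigma = cost, sigma
    return best_sigma, best_cost


# G: path 0-1-2 (centre 1).  H: the same path with the centre labelled 0.
A_G = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
A_H = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
sigma, cost = best_permutation(A_G, A_H)
print(sigma, cost)
```

The factorial cost of this search is exactly what motivates the orthogonal and doubly stochastic relaxations discussed above.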

2.5 Generalized Condition Number. A fundamental problem of linear algebra is solving the linear system of equations Ax = b [33]. The convergence rate of several iterative linear solvers, including the popular conjugate gradient method [8], depends on the distribution of the eigenvalues of the coefficient matrix A. In the special case of A being a symmetric positive semidefinite matrix the convergence rate is a function of the ratio of the largest to the smallest non-zero eigenvalue, also known as the condition number of matrix A. In the case of a preconditioned linear system the corresponding quantity that determines the rate of convergence of the solver, e.g., preconditioned conjugate gradient [8], is the generalized condition number. The definition follows:

Definition 1. (Generalized Condition Number [32]) Let A, B be two real matrices with the same null space K. λ is a generalized eigenvalue of the ordered pair of matrices (A, B), also called pencil, if there exists a vector x ∉ K such that Ax = λBx. Let Λ(A, B) be the set of generalized eigenvalues of the pencil (A, B). The generalized condition number κ(A, B) is defined as the ratio of the maximum value λ_max(A, B) to the minimum value λ_min(A, B).

For every unit norm vector x the following double inequality holds:

(2.1) $$\lambda_{\min}(A,B)\, x^T B x \;\le\; x^T A x \;\le\; \lambda_{\max}(A,B)\, x^T B x.$$

For the special case of interest where the pencil (L_G, L_H) is a pair of Laplacian matrices of two connected graphs G, H on the same vertex set, the condition number is given by the following expression:

$$\kappa(L_G, L_H) = \left( \max_{x^T \mathbf{1} = 0} \frac{x^T L_G x}{x^T L_H x} \right) \left( \max_{x^T \mathbf{1} = 0} \frac{x^T L_H x}{x^T L_G x} \right).$$

Notice that since G, H are connected their null space is the same, specifically the span of the all-ones vector 1 [22]. This also explains why we require the vector x to be orthogonal to 1 in the above expression.
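The expression above can be evaluated numerically for small graphs. The following is a minimal sketch under stated assumptions: it uses NumPy, the helper names `laplacian`, `pencil_eigenvalues` and `generalized_condition_number` are hypothetical, and the dense projection onto the subspace orthogonal to the all-ones vector is chosen for clarity. It is not the iterative Golub-Ye solver used in the experiments.

```python
import numpy as np


def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected graph on 0..n-1."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1.0
        L[v, v] += 1.0
        L[u, v] -= 1.0
        L[v, u] -= 1.0
    return L


def pencil_eigenvalues(L_G, L_H):
    """Non-trivial generalized eigenvalues of (L_G, L_H), in ascending order.

    Both graphs are assumed connected, so the shared null space is span(1).
    """
    n = L_G.shape[0]
    # Orthonormal basis Q of the subspace orthogonal to the all-ones vector.
    basis = np.hstack([np.ones((n, 1)), np.eye(n)[:, : n - 1]])
    Q = np.linalg.qr(basis)[0][:, 1:]
    A, B = Q.T @ L_G @ Q, Q.T @ L_H @ Q
    # B is positive definite on this subspace: B = C C^T (Cholesky).
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    return np.linalg.eigvalsh(C_inv @ A @ C_inv.T)


def generalized_condition_number(L_G, L_H):
    eig = pencil_eigenvalues(L_G, L_H)
    return eig[-1] / eig[0]


L_path = laplacian(3, [(0, 1), (1, 2)])  # path 0-1-2, centre 1
L_star = laplacian(3, [(0, 1), (0, 2)])  # same path, centre labelled 0
print(generalized_condition_number(L_path, L_path))  # identical labelling: close to 1
print(generalized_condition_number(L_path, L_star))  # misaligned labelling: above 1
```

Note that the ratio is invariant to rescaling either Laplacian, so κ(L, cL) is also 1 for any c > 0.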

Generalized eigenvalue problems of a special form have several important applications in computer science. The seminal paper of Shi and Malik [68] considered the generalized eigenvalue problem Lx = λDx, where L is the Laplacian matrix representation of the graph and D is a diagonal matrix whose i-th entry is the degree of the i-th vertex, to produce image segmentations. Exactly the same problem was used to produce structure-informative graph embeddings by Belkin and Niyogi [13]. Studying the spectrum of the pencil (L, D) is equivalent to studying the so-called normalized Laplacian matrix representation. Problems of this form appear frequently in structural mechanics, where L is typically called the stiffness matrix and D the mass matrix. It is worth pointing out that, to the best of our knowledge, we were not able to find any other applications of generalized eigenvalue problems in the literature where D is not diagonal.

3 VertexSim: Vertex Similarity via Simplex Fitting 

Before we delve into the details of the VertexSim algorithm, we wish to outline its central idea: along the lines of archetypal analysis [24], there exist few "extreme" types of vertices. Let us call these vertices pure. The vast majority of vertices, however, are mixtures of those pure types of vertices. We formalize the notion of "extreme" in a geometric sense. Specifically, we take advantage of the geometry of the network (for a discussion of whether the assumption of the existence of geometric structure holds or not, see Section 6) by embedding it in a low-dimensional Euclidean space R^k. However, instead of taking the convex hull of the points as the set of pure points we fit a simplex, i.e., the convex hull of k + 1 affinely independent points, to the cloud of points. The vertices of the simplex are the pure types. The rationale is that we can express each other point uniquely as a convex combination of the pure points and hence make a quantitative analysis of vertex similarity. Furthermore, having for each vertex in the graph a vector of mixture coefficients, we can build data structures, e.g., [7], that answer several types of queries such as "which are the three vertices most similar to vertex u?" etc. Among all simplexes that fit the cloud of points we favor the one with the smallest volume, inspired by the seminal work of Craig [23]. We shall call the vertices of the fitted simplex social network archetypes since they correspond to the pure types of vertices.

There are several methods to obtain an informative embedding of the data points in the Euclidean space. The majority of them are spectral. Recently, more expensive but structure-preserving methods based on semidefinite programming have been proposed [67]. In our experiments we choose the k smallest non-trivial eigenvectors, i.e., the eigenvectors corresponding to the k smallest non-zero eigenvalues, of the normalized Laplacian [13]. There exists a wide variety of off-the-shelf algorithms that find a minimum volume enclosing simplex [27, 56, 74, 75, 76], with good implementations publicly available. We use the algorithm developed by the author and his co-authors in a previous paper [74], which solves the following optimization problem, where X = {x_1, ..., x_n} ⊂ R^k is the cloud of points, K = [v_0 | ... | v_k] is a simplex in R^k and θ_i ∈ [0, 1]^{k+1} for i = 1, ..., n is the vector of mixture coefficients of point i:

(3.2) $$\min_{K,\,\theta} \; \sum_{i=1}^{n} \lVert x_i - K\theta_i \rVert_p + \gamma \log \mathrm{vol}(K) \quad \text{subject to} \quad \forall i:\; \sum_{j} \theta_{ji} = 1,\;\; \theta_i \succeq 0.$$

For completeness we include the steps of our method in Algorithm 1. In the second step, we use the Nelder-Mead method to optimize Program (3.2). In the third and fourth step we compute the similarity between every possible pair of vertices and insert the mixture coefficient vectors as points into a data structure that supports the type of queries we are interested in, e.g., nearest neighbor queries.
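Once a simplex has been fitted, the third step reduces to linear algebra. The sketch below assumes the simplex K is already known (it skips the optimization of Program (3.2) entirely), uses NumPy, and introduces the hypothetical helper names `mixture_coefficients` and `pearson_similarity`. Mixture coefficients are obtained by solving the linear system that combines x = Kθ with the constraint that the entries of θ sum to 1.

```python
import numpy as np


def mixture_coefficients(K, X):
    """Barycentric (mixture) coefficients of points with respect to a simplex.

    K: k x (k+1) array whose columns are the simplex vertices.
    X: k x n array whose columns are the embedded points.
    Returns a (k+1) x n array whose i-th column is theta_i.
    """
    m = K.shape[1]
    n = X.shape[1]
    # Stack the affine constraint sum(theta) = 1 under K theta = x.
    system = np.vstack([K, np.ones((1, m))])
    rhs = np.vstack([X, np.ones((1, n))])
    return np.linalg.solve(system, rhs)


def pearson_similarity(theta):
    """n x n matrix of Pearson correlations between the columns of theta.

    A column with identical entries (e.g. the simplex centroid) has zero
    variance, so its correlations are undefined (NaN).
    """
    return np.corrcoef(theta.T)


# Triangle with vertices (0,0), (1,0), (0,1) and three embedded points.
K = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
X = np.array([[0.20, 0.25, 0.70],
              [0.30, 0.30, 0.10]])
theta = mixture_coefficients(K, X)
sim = pearson_similarity(theta)
print(theta[:, 0])           # coefficients of the first point
print(sim[0, 1], sim[0, 2])  # the first point is closer to the second than to the third
```

For the fourth step, the columns of `theta` could be inserted into any nearest neighbor index; the choice of index is left open here, as it is in the text.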

Algorithm 1 VertexSim

Require: Connected, undirected graph G([n], E). Dimension k. Parameter γ.

(1) Embed the graph using the k smallest non-trivial eigenvectors of the normalized Laplacian of G.

(2) [K, {θ_i}_{i∈[n]}] ← Solve Optimization Problem (3.2) using the Nelder-Mead method.

(3) For every pair of vertices compute Pearson's correlation coefficient ρ_ij between the mixture coefficient vectors θ_i, θ_j and set the similarity value Sim(i, j) equal to ρ_ij.

(4) Add points {θ_i}_{i∈[n]} to a data structure supporting nearest neighbor search queries.

4 CondSim: Graph Similarity and the Generalized Condition Number

As already outlined in Section 1, the graph matching problem poses a challenging question: What is a good measure of similarity? Typically, upon having a good objective one can come up with various algorithms to optimize it. In this section we provide a novel measure of graph similarity between two graphs G, H. Our measure is the generalized condition number of the Laplacian matrix representations L_G, L_H of the two graphs. Recall from Section 2.5 that the generalized condition number plays an important role in numerical analysis. In Section 4.1 we provide our main theoretical result together with our proposed algorithms, namely CondSimMC and CondSimGradDescent. In Section 4.2 we provide further insight on the proposed measure of similarity between graphs.

4.1 Theoretical Result and Algorithms. Let G([n], E_G), H([n], E_H) be connected graphs on n vertices (named for simplicity {1, 2, .., n} = [n]) and L_G, L_H their Laplacian matrix representations respectively. Also, let Λ(L_G, L_H) and κ(L_G, L_H) be the set of generalized eigenvalues and the generalized condition number of the pencil (L_G, L_H) [32, 33]. We use S_n to denote the symmetric group, i.e., the group whose elements are all the permutations of the set [n] and whose group operation is the composition of such permutations. We denote with L_{G(σ)}, where σ ∈ S_n, the Laplacian matrix representation of the graph G whose vertex set has been renamed according to σ, i.e., v ↦ σ(v) for all v ∈ [n]. Our main result is the next theorem.

THEOREM 4.1. Let Ω be the state space representing the set of all permutations {σ : σ ∈ S_n}, and f : Ω → R^+ be a function defined by f(ω) = κ(L_G, L_{H(ω)}) for all ω ∈ Ω. Also, fix λ > 1 and define π_λ(ω) = λ^{-f(ω)} / Z(λ), where Z(λ) = Σ_{x∈Ω} λ^{-f(x)} is the normalizing constant that makes π_λ a probability measure. We define a Metropolis chain where we allow transitions between two states if and only if they differ by a transposition, as follows: if f(ω_1) < f(ω_2) the Metropolis chain accepts the transition ω_1 → ω_2 with probability λ^{f(ω_1) - f(ω_2)}, otherwise it always accepts it.

As λ → +∞ the stationary distribution π_λ of the Metropolis chain converges to the uniform distribution over the global minima of f. Furthermore, if G ~ H, i.e., G, H are isomorphic, then π_λ converges to the uniform distribution over the set of isomorphisms {σ : L_G = L_{H(σ)}, σ ∈ S_n}.

Before we prove Theorem 4.1 consider two graphs G, H on 4 vertices each. Figure 1(b) shows the permutahedron with 4! = 24 vertices, each corresponding to one of the 24 possible permutations of the set [4] = {1, 2, 3, 4} [18]. Assume that there exist three permutations σ_1, σ_2, σ_3 such that L_G = L_{H(σ_i)} for i = 1, 2, 3. The Metropolis chain [50] we define in Theorem 4.1 converges in the limit of λ → +∞ to the uniform distribution over the shaded set, i.e., π(Shaded Vertex) = 1/3, π(Non-Shaded Vertex) = 0. When G and H are not isomorphic there exists no state ω in Ω such that κ(L_G, L_{H(ω)}) = 1 and hence for any σ ∈ S_n, κ(L_G, L_{H(σ)}) > 1. By standard properties of the Metropolis chain we know that the chain converges to the minima of the generalized condition number. In Section 4.2 we explain further why this is a good property using the theory of support tree preconditioners. It is also worth pointing out that by using transpositions we can reach any state/permutation of Ω, since every permutation is a product of transpositions.

Algorithm 2 CondSimMC

Require: L_G, L_H the Laplacian matrix representations of G, H respectively. MAXITER (maximum number of iterations). λ > 1. TOLERANCE. {RAND() generates a number uniformly at random in [0, 1]} {σ initialized to the identity permutation}

σ ← (1, 2, .., n)
CN ← κ(L_G, L_{H(σ)})
i ← 0
while i < MAXITER do
    i ← i + 1
    σ_neigh ← Transpose two elements of σ (uniformly at random).
    if κ(L_G, L_{H(σ_neigh)}) - κ(L_G, L_{H(σ)}) < 0 then
        σ ← σ_neigh
        CN ← κ(L_G, L_{H(σ)})
    else
        if RAND() < λ^{κ(L_G, L_{H(σ)}) - κ(L_G, L_{H(σ_neigh)})} then
            σ ← σ_neigh
            CN ← κ(L_G, L_{H(σ)})
        end if
    end if
    if CN < 1+TOLERANCE then
        BREAK
    end if
end while
return (σ, CN)

Proof. Recall that the Laplacian representation of a connected graph is a symmetric, positive semidefinite matrix and that the dimension of the null space is 1 (it is spanned by the all-ones vector 1) [22]. Consider now the generalized eigenvalue problem L_G x = λ L_H x. The pencil (L_G, L_H) is Hermitian semidefinite. Therefore there exists a basis of generalized eigenvectors [33]. Notice that the all-ones vector 1 is a generalized eigenvector with corresponding generalized eigenvalue 0, see also [82], pp. 35-36. Let Λ(L_G, L_H) = {0 = λ_0 < λ_1 ≤ ... ≤ λ_{n-1}} be the set of generalized eigenvalues. Then, κ(L_G, L_H) = λ_{n-1} / λ_1. We prove that κ(L_G, L_H) = 1 if and only if G ~ H.

• κ(L_G, L_H) = 1 ⇒ G ~ H:

The generalized eigenvalues are Λ(L_G, L_H) = (0 = λ_0 < 1 = λ_1 = ... = λ_{n-1}). Let (1 = u_0, u_1, ..., u_{n-1}) be the corresponding generalized eigenvectors, which form a basis. Define X = L_G - L_H. Notice that X u_i = 0 for all i = 0, .., n-1. Hence, X = 0 and therefore L_G = L_H ⇒ G ~ H.

• G ~ H ⇒ ∃σ ∈ S_n such that κ(L_G, L_{H(σ)}) = 1:

Since G ~ H there exists a permutation σ ∈ S_n such that L_G = L_{H(σ)}. Simply, by substituting the eigenvectors {u_i}_{i=0,..,n-1} of L_G in L_G x = λ L_{H(σ)} x = λ L_G x we obtain that the generalized eigenvalues are (0, 1, 1, .., 1) and the corresponding eigenvectors are (1 = u_0, u_1, ..., u_{n-1}). Hence, κ(L_G, L_{H(σ)}) = 1.

Now, define Ω* = {ω ∈ Ω : f(ω) = f* = min_{x∈Ω} f(x)}. Since our chain is a Metropolis chain [50], its stationary distribution is π_λ. Therefore,

(4.3) $$\lim_{\lambda \to +\infty} \pi_\lambda(\omega) = \lim_{\lambda \to +\infty} \frac{\lambda^{-f(\omega)} / \lambda^{-f^*}}{|\Omega^*| + \sum_{x \in \Omega \setminus \Omega^*} \lambda^{-f(x)} / \lambda^{-f^*}} = \frac{I(\omega \in \Omega^*)}{|\Omega^*|}$$

where I(a ∈ A) is an indicator variable equal to 1 if element a belongs to set A, otherwise 0. If G, H are isomorphic then f* = 1 and therefore the above result suggests that the Metropolis chain converges to the uniform distribution over the set {σ : L_G = L_{H(σ)}, σ ∈ S_n}.

Algorithm 2 shows the pseudocode for CondSimMC, which was described in Theorem 4.1. The algorithm takes as input the two Laplacians L_G, L_H, the parameter MAXITER, which is the maximum number of steps that we allow the Metropolis chain to perform, the parameter λ > 1, and the parameter TOLERANCE > 0, which quantifies an early stopping criterion that allows CondSimMC to terminate when the condition number is at most 1+TOLERANCE.
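A compact Python sketch of this Metropolis chain follows. It is an illustration rather than the implementation used in the experiments: it relies on NumPy and a dense eigenvalue computation instead of the Golub-Ye solver, the names `kappa`, `permute` and `cond_sim_mc` and all default parameter values are assumptions, and the condition number of the starting permutation is computed up front so that the early-stopping test is always defined.

```python
import random
import numpy as np


def kappa(L_G, L_H):
    """Generalized condition number of the pencil of two connected-graph Laplacians."""
    n = L_G.shape[0]
    # Orthonormal basis of the subspace orthogonal to the all-ones vector.
    Q = np.linalg.qr(np.hstack([np.ones((n, 1)), np.eye(n)[:, : n - 1]]))[0][:, 1:]
    A, B = Q.T @ L_G @ Q, Q.T @ L_H @ Q
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    eig = np.linalg.eigvalsh(C_inv @ A @ C_inv.T)
    return eig[-1] / eig[0]


def permute(L, sigma):
    """Laplacian of the graph whose vertex v has been renamed to sigma[v]."""
    P = np.empty_like(L)
    P[np.ix_(sigma, sigma)] = L
    return P


def cond_sim_mc(L_G, L_H, max_iter=1000, lam=100.0, tol=1e-6, seed=0):
    rng = random.Random(seed)
    n = L_G.shape[0]
    sigma = list(range(n))  # identity permutation
    cn = kappa(L_G, permute(L_H, sigma))
    for _ in range(max_iter):
        if cn < 1.0 + tol:
            break
        i, j = rng.sample(range(n), 2)  # random transposition
        neigh = sigma[:]
        neigh[i], neigh[j] = neigh[j], neigh[i]
        cn_neigh = kappa(L_G, permute(L_H, neigh))
        # Always accept improvements; accept worse states with prob. lam^(cn - cn_neigh).
        if cn_neigh < cn or rng.random() < lam ** (cn - cn_neigh):
            sigma, cn = neigh, cn_neigh
    return sigma, cn


# G: path 0-1-2.  H: the same path with the centre vertex labelled 0.
L_G = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
L_H = np.array([[2.0, -1.0, -1.0], [-1.0, 1.0, 0.0], [-1.0, 0.0, 1.0]])
sigma, cn = cond_sim_mc(L_G, L_H)
print(sigma, cn)
```

On this tiny instance every non-isomorphic labelling has a transposition that aligns the two paths, so the chain reaches condition number 1 after a handful of steps; larger instances may need many more iterations and a smaller λ to escape local minima.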

¹ Notice that despite the fact that our matrices are positive semidefinite, and not positive definite, this does not cause any real problem with respect to defining the generalized condition number since L_G, L_H have the same null space.



Algorithm 3 shows an alternative way to minimize the generalized condition number, which we call CondSimGradDescent. The input of CondSimGradDescent does not require the parameter λ since it is deterministic. Algorithm 3 takes the parameter ε > 0, which quantifies the least amount of progress required by the algorithm to keep iterating. This may help in avoiding extremely incremental improvements which do not significantly improve the graph matching but are computationally costly. It is worth pointing out that in our experiments we set ε = 0, but nonetheless its existence may be convenient in certain executions. CondSimGradDescent performs a gradient descent with respect to the generalized condition number using transpositions. On the one hand, Algorithm 3 tends to be computationally more aggressive in the sense that it always moves to a state/permutation which results in a smaller generalized condition number. On the other hand, Algorithm 2, due to the randomization, is likely to avoid local minima, in contrast to the deterministic Algorithm 3. Both CondSimMC and CondSimGradDescent return the permutation which defines the best graph alignment found and the corresponding condition number.
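The deterministic variant can be sketched in the same style. As before, this is an illustration under assumptions: NumPy with a dense eigenvalue computation stands in for the Golub-Ye solver, and the names `kappa`, `permute`, `cond_sim_grad_descent` and the default parameter values are hypothetical.

```python
from itertools import combinations
import numpy as np


def kappa(L_G, L_H):
    """Generalized condition number of the pencil of two connected-graph Laplacians."""
    n = L_G.shape[0]
    # Orthonormal basis of the subspace orthogonal to the all-ones vector.
    Q = np.linalg.qr(np.hstack([np.ones((n, 1)), np.eye(n)[:, : n - 1]]))[0][:, 1:]
    A, B = Q.T @ L_G @ Q, Q.T @ L_H @ Q
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    eig = np.linalg.eigvalsh(C_inv @ A @ C_inv.T)
    return eig[-1] / eig[0]


def permute(L, sigma):
    """Laplacian of the graph whose vertex v has been renamed to sigma[v]."""
    P = np.empty_like(L)
    P[np.ix_(sigma, sigma)] = L
    return P


def cond_sim_grad_descent(L_G, L_H, max_iter=100, tol=1e-6, eps=0.0):
    n = L_G.shape[0]
    sigma = list(range(n))  # identity permutation
    cn = kappa(L_G, permute(L_H, sigma))
    for _ in range(max_iter):
        if cn < 1.0 + tol:
            break
        # Examine every permutation that differs from sigma by one transposition.
        best_neigh, best_cn = None, cn
        for i, j in combinations(range(n), 2):
            neigh = sigma[:]
            neigh[i], neigh[j] = neigh[j], neigh[i]
            c = kappa(L_G, permute(L_H, neigh))
            if c < best_cn:
                best_neigh, best_cn = neigh, c
        # Stop when the best move does not improve by more than eps.
        if best_neigh is None or cn - best_cn <= eps:
            break
        sigma, cn = best_neigh, best_cn
    return sigma, cn


# G: path 0-1-2.  H: the same path with the centre vertex labelled 0.
L_G = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
L_H = np.array([[2.0, -1.0, -1.0], [-1.0, 1.0, 0.0], [-1.0, 0.0, 1.0]])
sigma, cn = cond_sim_grad_descent(L_G, L_H)
print(sigma, cn)
```

Each iteration evaluates all n(n-1)/2 transpositions, which is the source of the quadratic factor in the running time discussed next.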

The complexities of our algorithms clearly depend on the choice of algorithm that solves the generalized eigenvalue problem. Specifically, let J(L_G, L_H) be the corresponding running time as a function of the two Laplacians. Also let q abbreviate the maximum number of iterations MAXITER. Then the total running time is upper bounded by O(q n^2 J(L_G, L_H)), since we perform q steps and at each step we compute the generalized condition number for the n(n-1)/2 possible transpositions. In our experiments we use the algorithm of Golub and Ye [32]. The speed of convergence is given in Lemma 1, p. 8 [32]. For our purposes, since we set the number of iterations (which are matrix-vector multiplications) of the Golub-Ye algorithm to a constant, we may assume that the running time of computing the smallest non-trivial and the largest generalized eigenvalue of the pencil (L_G, L_H) is linear in the total number of edges |E_G| + |E_H| = O(m), where m = max(|E_G|, |E_H|). If m is too large, i.e., m >> n log n, one can use the developed theory of spectral sparsifiers to speed up the generalized condition number computations. Specifically, one may first perform the popular Spielman-Srivastava sparsification [71] on both Laplacians L_G, L_H, obtain spectrally equivalent matrices L̃_G, L̃_H and apply CondSimGradDescent to the latter Laplacians.

Algorithm 3 CondSimGradDescent

Require: L_G, L_H the Laplacian matrix representations of G, H respectively. MAXITER (maximum number of iterations). TOLERANCE, ε > 0. {σ initialized to the identity permutation}

σ ← (1, 2, .., n)
i ← 0
while i < MAXITER do
    i ← i + 1
    σ* ← argmax_{σ' ∈ S'} κ(L_G, L_{H(σ)}) - κ(L_G, L_{H(σ')}), where S' is the set of all permutations which differ from σ by a single transposition.
    if κ(L_G, L_{H(σ)}) - κ(L_G, L_{H(σ*)}) > ε then
        σ ← σ*
        CN ← κ(L_G, L_{H(σ)})
    else
        BREAK
    end if
    if CN < 1+TOLERANCE then
        BREAK
    end if
end while
return (σ, CN)

4.2 Further Insight: Theory of Support Trees. We proved in Theorem 4.1 that when the generalized condition number is 1, then indeed G, H can be perfectly matched, i.e., G, H are isomorphic. To complete the justification of our rationale behind the choice of our measure of similarity we need to explain why a value close to 1 implies a good graph alignment. The answer lies in the theory of support preconditioners [1]. Preconditioning is one of the "holy grails" of scientific computing. Pravin Vaidya, who did not publish any of his work, brought remarkable new ideas to this field, giving birth to the field called today combinatorial scientific computing. The following facts come from Gremban's thesis [34], which continued and significantly extended Vaidya's ideas. These complete the justification of our rationale. In the following, let A, B be Laplacian matrices.

Definition 2. (Support) The support σ(A, B) of matrix B for A is the greatest lower bound over all τ such that τB - A is positive semidefinite, i.e., σ(A, B) = lim inf{τ : τB - A ⪰ 0}.
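For Laplacians of connected graphs on the same vertex set, the support can be computed directly: τB - A is positive semidefinite exactly when τ is at least the largest ratio of the quadratic forms of A and B over vectors orthogonal to the all-ones vector, i.e., the largest generalized eigenvalue of Definition 1. The sketch below checks this numerically. It assumes NumPy, and the helper names `pencil_eigenvalues` and `support` are hypothetical.

```python
import numpy as np


def pencil_eigenvalues(A, B):
    """Non-trivial generalized eigenvalues of the Laplacian pencil (A, B), ascending."""
    n = A.shape[0]
    # Orthonormal basis of the subspace orthogonal to the all-ones vector.
    Q = np.linalg.qr(np.hstack([np.ones((n, 1)), np.eye(n)[:, : n - 1]]))[0][:, 1:]
    A_r, B_r = Q.T @ A @ Q, Q.T @ B @ Q
    C_inv = np.linalg.inv(np.linalg.cholesky(B_r))
    return np.linalg.eigvalsh(C_inv @ A_r @ C_inv.T)


def support(A, B):
    """Support of B for A: the smallest tau such that tau*B - A is PSD."""
    return pencil_eigenvalues(A, B)[-1]


# A: Laplacian of the path 0-1-2.  B: the same path with the centre labelled 0.
A = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
B = np.array([[2.0, -1.0, -1.0], [-1.0, 1.0, 0.0], [-1.0, 0.0, 1.0]])

tau = support(A, B)
eig = pencil_eigenvalues(A, B)
print(np.linalg.eigvalsh(tau * B - A).min())         # approximately 0: PSD at tau
print(np.linalg.eigvalsh(0.99 * tau * B - A).min())  # negative: not PSD below tau
print(support(A, B) * support(B, A), eig[-1] / eig[0])
```

The last line prints the product of the two supports next to the generalized condition number; for this pair the two values coincide, because the support of A for B is the reciprocal of the smallest non-trivial generalized eigenvalue of (A, B).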

Definition 3. (Congestion & Dilation) An embedding of H into G is a mapping of vertices of H onto vertices of G, and edges of H onto paths in G. The dilation d(G, H) of the embedding is the length of the longest path in G onto which an edge of H is mapped. The congestion g_e(G, H) of an edge e in G is the number of paths of the embedding that contain e. The congestion g(G, H) of the embedding is the maximum congestion of the edges in G.

The following facts have been proved in Gremban's Ph.D.
thesis [34], see also [35]:

Fact 1: The support number $\sigma(A, B)$ is bounded above
by the maximum product of dilation and congestion over all
embeddings of A into B.

Fact 2: $\kappa(A, B) \le \sigma(A, B)\,\sigma(B, A)$, Lemma 4.8 [34].

Fact 2, in combination with Fact 1, shows that the generalized
condition number is closely related to the goodness


[Figure 2 plot; x-axis: Simplex Dimension]

Figure 2: Performance of simplex fitting on 1000 points
drawn uniformly at random from a randomly generated
k-simplex perturbed by Gaussian noise $N(0, \sigma^2)$. The figure
plots the sum of Euclidean distances of the k + 1 reconstructed
simplex vertices from the k + 1 true vertices as a function
of the dimensionality k of the simplex for five different
standard deviations $\sigma$ = 0.01, 0.5, 1, 5, 10. Notice that the
simplex fitting method (essentially) perfectly recovers the
true simplex in all cases.

of two embeddings, i.e., of H into G and vice versa. When
both the dilation and the congestion of the embeddings (or, in
our terminology, of the alignments) are small, the generalized
condition number is small. The reader interested
in further details and proofs is urged to read Chapter 4 of
Gremban's thesis [34]. In combination with Theorem 4.1,
this justifies our choice of the generalized condition number
as a measure of graph similarity. For further discussion
of other properties of the generalized condition number, see
Section 6.
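As a small worked check of Facts 1 and 2, consider a triangle and a path on the same three vertices. This example is illustrative and does not appear in the original experiments. It assumes connected graphs, for which the support equals the largest generalized eigenvalue of the pencil restricted to the complement of the all-ones vector. The helper names `pencil_eigs` and `support` are hypothetical.

```python
import numpy as np

def pencil_eigs(LA, LB):
    """Generalized eigenvalues of (LA, LB), ascending, for connected graphs.

    Grounding the first vertex (dropping its row and column) removes the
    shared all-ones null vector.
    """
    A, B = LA[1:, 1:], LB[1:, 1:]
    C = np.linalg.cholesky(B)
    M = np.linalg.solve(C, np.linalg.solve(C, A).T).T
    return np.linalg.eigvalsh((M + M.T) / 2)

def support(LA, LB):
    """sigma(LA, LB): the smallest tau with tau * LB - LA positive semidefinite."""
    return pencil_eigs(LA, LB)[-1]

# Laplacians of the path 0-1-2 and of the triangle on vertices {0, 1, 2}.
path = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
tri = np.array([[2.0, -1.0, -1.0], [-1.0, 2.0, -1.0], [-1.0, -1.0, 2.0]])

s_tri_path = support(tri, path)   # about 3
s_path_tri = support(path, tri)   # about 1
eigs = pencil_eigs(tri, path)
kappa = eigs[-1] / eigs[0]        # about 3

# Embedding of the triangle into the path: edge (0, 2) is routed along
# 0-1-2, so the dilation is 2 and each path edge carries 2 routed edges.
dilation, congestion = 2, 2
```

Here the support of the triangle for the path is 3, which is below the dilation-congestion product 4 as Fact 1 requires, and the bound of Fact 2 holds with equality since 3 = 3 · 1.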

5 Experiments 



Name (Abbr.)        Nodes (n)   Edges (m)
Netscience          1589        2742
Football            115         613
Political Books     105         441
* Erdos '72         5488        7085
Erdos '82           5822        7375
* Erdos '02         6927        8472
* Roget Thesaurus   1022        3648

Table 1: Datasets

In this Section we provide an experimental evaluation of
the proposed algorithmic schemes, VertexSim and CondSim.
The Section is organized as follows: in Sections 5.1 and 5.2 we
describe the datasets we used in our experiments and the
experimental setup. In Sections 5.3 and 5.4 we provide an
experimental evaluation of VertexSim and CondSim respectively.

5.1 Datasets Table 1 summarizes the real-world datasets
we used for our experiments. Whenever necessary, graphs





[Figure 3 plot; axis label: 2nd Eigenvector]

Figure 3: 2-simplex fitted on a random stratified net- 
work. VertexSim correctly assigns higher similarity val- 
ues to vertices of the same age. The three vertices of 
the fitted 2-simplex conceptually represent the concepts 'se- 
nior/large age' (8-10), 'middle-aged/medium age'(4-7) and 
'young/small age' (1-3). 

are made undirected and unweighted, and self loops are
removed. Datasets annotated with © and * are available
from http://www.cise.ufl.edu/research/sparse/matrices/Newman/index.html
and http://www.cise.ufl.edu/research/sparse/matrices/Pajek/index.html
respectively. We pick small and medium sized networks on
purpose since their geometric structure is striking. See
Section 6 for further discussion concerning social networks
and geometric structure.

We also generate several synthetic datasets. Specifically,
for Section 5.3.1 we generate a cloud of points where
each point is chosen uniformly at random from a random
k-simplex (see Appendix [50]) and a random stratified
social network, see Section IIIA of [49]. Stratified networks
model the phenomenon according to which individuals make
connections with similar others with respect to some
criterion, e.g., income, age. For each vertex we pick an age
from 1 to 10, chosen uniformly at random. Two vertices
with age i and j respectively are connected with probability
$p_0 e^{-\alpha \Delta t}$ where $\Delta t = |i - j|$. The parameters are set
to $\alpha = 0.8$ and $p_0 = 0.1$. Finally, for Section 5.4 we
generate random graphs of two types: Erdos-Renyi-Gilbert
graphs [17] and R-MAT graphs [20]. For the former we
use p = 0.5 and for the latter the parameters are set to
[a = 0.55, b = 0.1; c = 0.1, d = 0.25].
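The two simpler generators described above can be sketched as follows. This is a minimal illustration assuming NumPy; the function names, the seeding scheme and the dense adjacency-matrix representation are assumptions made for the example, and the R-MAT generator is not reproduced here.

```python
import numpy as np

def stratified_network(n, alpha=0.8, p0=0.1, n_ages=10, seed=0):
    """Random stratified network: ages are uniform in 1..n_ages and vertices
    with ages i, j are joined with probability p0 * exp(-alpha * |i - j|)."""
    rng = np.random.default_rng(seed)
    ages = rng.integers(1, n_ages + 1, size=n)
    dt = np.abs(ages[:, None] - ages[None, :])
    prob = p0 * np.exp(-alpha * dt)
    # Draw each unordered pair once (strict upper triangle), then mirror.
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    return ages, (upper | upper.T).astype(int)

def erdos_renyi(n, p=0.5, seed=0):
    """Erdos-Renyi-Gilbert graph G(n, p) as a symmetric 0/1 matrix."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)

ages, adj = stratified_network(200, seed=1)
g = erdos_renyi(16, seed=2)
```

Sampling only the strict upper triangle guarantees an undirected graph without self loops, matching the preprocessing applied to the real datasets.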

5.2 Experimental Setup and Implementation Details 

The experiments were performed on a single machine, with
Intel Xeon CPU at 2.83 GHz, 6144KB cache size and
50GB of main memory. Our algorithms are implemented in
MATLAB. We used the Golub-Ye algorithm OH to compute 
condition numbers. The results we show are obtained with
the parameter $\gamma$ set to 1 and the dimensionality k
of the embedding equal to 2. Clearly our method is most valuable
when k is larger than 3, where visualization is impossible
(see also Section [23] for a discussion of the computational
complexity as a function of k). Here we report results
for k = 2 for visualization purposes. It's worth pointing
out two more facts concerning our experimental section:
first, VertexSim was insensitive to the value of the parameter
$\gamma$ for the datasets we used, since there were no outliers
in any of the embeddings; secondly, we experimented
with higher values of k (from 3 to 5), obtaining highly
interpretable results. The parameters of CondSimGradDesc
were set to MAXITER=200, TOLERANCE=0, $\epsilon$ = for all
experiments in Section 5.4.

A final remark with respect to the experiments of CondSim
in Section 5.4: it is a well known fact that a permutation
can be decomposed into cycles and that a random permutation
has in expectation O(log n) cycles JSTJ. Therefore, if
in our experiments we generated only permutations chosen
uniformly at random, we would be restricting ourselves with
high probability to permutations which have a common
structure. To avoid a potential artifact in our experimental
results, we generate permutations with a varying number of
cycles. We use a simple recursive algorithm lIBD to generate
a permutation with k cycles uniformly at random in our
experiments, see p. 33 (ST). Finally, we use third-party
software, specifically the code of [12] and Jeremy Kepner's
R-MAT code implementation.
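One standard way to draw a permutation with exactly k cycles uniformly at random is sketched below. It is an assumption that this matches the recursive algorithm cited above, since the text does not spell that algorithm out. The sketch uses the recurrence c(n, k) = c(n-1, k-1) + (n-1) c(n-1, k) for the unsigned Stirling numbers of the first kind; all function names are hypothetical.

```python
import random
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling1(n, k):
    """Number of permutations of n elements with exactly k cycles."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return stirling1(n - 1, k - 1) + (n - 1) * stirling1(n - 1, k)

def random_perm_with_k_cycles(n, k, rng):
    """Uniform random permutation of 0..n-1 with k cycles; perm[i] is the image of i."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if n == 1:
        return [0]
    # The last element is a fixed point with probability c(n-1, k-1) / c(n, k).
    if rng.randrange(stirling1(n, k)) < stirling1(n - 1, k - 1):
        perm = random_perm_with_k_cycles(n - 1, k - 1, rng)
        perm.append(n - 1)
    else:
        # Otherwise insert it right after a uniformly chosen element j
        # inside one of the existing cycles.
        perm = random_perm_with_k_cycles(n - 1, k, rng)
        j = rng.randrange(n - 1)
        perm.append(perm[j])
        perm[j] = n - 1
    return perm

def count_cycles(perm):
    """Number of cycles in the cycle decomposition of perm."""
    seen, cycles = [False] * len(perm), 0
    for start in range(len(perm)):
        s = start
        if not seen[s]:
            cycles += 1
            while not seen[s]:
                seen[s] = True
                s = perm[s]
    return cycles

rng = random.Random(0)
perms = {k: random_perm_with_k_cycles(8, k, rng) for k in range(1, 9)}
```

Every permutation with k cycles arises from exactly one sequence of choices, so weighting the two branches by the Stirling numbers yields the uniform distribution.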



5.3 VertexSim at Work Sections 5.3.1 and 5.3.2 present 
the results of VertexSim on synthetic and real data respec- 
tively. 

5.3.1 Synthetic Data We validate the VertexSim algorithm
in two ways: first we verify that it can successfully
recover the simplex S and hence the mixture coefficients of
data points sampled uniformly at random from S, and secondly
we evaluate its performance on a stratified network.
Figure 2 shows the performance of our fitting method as a
function of the simplex dimension (x-axis) for five different
standard deviations $\sigma$ = 0.01, 0.5, 1, 5, 10 (5 lines) for a
randomly generated k-simplex. The quality of the performance
(y-axis) is quantified as the sum $\sum_{i=1}^{k+1} \|v_i - \hat{v}_i\|$, where $v_i$ is
a true vertex and $\hat{v}_i$ is the corresponding reconstructed
vertex of the k-simplex. The performance
is excellent as Fig. 2 shows. The average running time for
four executions is 0.0071 and the variance $1.4 \times 10^{-6}$. It's
worth pointing out that Tsang's [75] algorithm also results in
exactly the same simplex.
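The synthetic setup and the error measure can be sketched as follows. This is an illustration under assumptions: points are drawn uniformly from the simplex through Dirichlet(1, ..., 1) mixture coefficients, which may differ from the sampling procedure of the Appendix, and reconstructed vertices are matched to true vertices by brute force over all orderings, which is only practical for small k. The function names are hypothetical.

```python
import itertools
import numpy as np

def sample_simplex(vertices, m, sigma=0.0, seed=0):
    """Draw m points uniformly from the simplex with the given vertices
    (one vertex per row) and add Gaussian noise N(0, sigma^2) per coordinate."""
    rng = np.random.default_rng(seed)
    coeffs = rng.dirichlet(np.ones(len(vertices)), size=m)  # uniform mixtures
    points = coeffs @ vertices
    if sigma > 0:
        points = points + rng.normal(0.0, sigma, size=points.shape)
    return points, coeffs

def vertex_error(true_v, est_v):
    """Sum of Euclidean distances between matched true and reconstructed
    vertices, minimized over all matchings."""
    d = np.linalg.norm(true_v[:, None, :] - est_v[None, :, :], axis=2)
    k1 = len(true_v)
    return min(
        sum(d[i, p[i]] for i in range(k1))
        for p in itertools.permutations(range(k1))
    )

verts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
pts, coeffs = sample_simplex(verts, 1000, sigma=0.0, seed=3)
err = vertex_error(verts, verts[[2, 0, 1]])  # same vertices, reordered
```

The matching step matters because a fitting method returns the simplex vertices in arbitrary order.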

Figure 3 shows the performance of VertexSim for a
stratified network with $\alpha = 0.8$ and $p_0 = 0.1$ and ages
ranging from 1 to 10, picked uniformly at random for every
vertex. Specifically, Fig. 3 shows the fitted 2-simplex. Upon
performing step 3 of Algorithm 1, it becomes apparent that 
pairs of vertices with the same age are significantly more 
similar than vertices with different ages, as a good vertex 

Figure 4: Minimum area 2-simplexes for the (a) Football network (b) Political books network. In both cases VertexSim
provides significant mining capabilities for extracting pairs of highly similar vertices and concepts. For more see
Section 5.3.2.



similarity algorithm should have as its output. Furthermore 
the three vertices of the 2-simplex correspond to the three 
concepts 'senior/large age', 'middle-aged/medium age' and 
'young/small age'. 
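Given a fitted simplex, mapping each embedded vertex to its mixture coefficients and answering a "most similar" query can be sketched as below. Two assumptions are made for illustration: the simplex is a full-dimensional k-simplex in R^k, so the barycentric coordinates are the unique solution of a linear system, and similarity is measured by Euclidean distance between coefficient vectors. A linear scan stands in for the nearest neighbor data structures mentioned in the paper, and the function names are hypothetical.

```python
import numpy as np

def mixture_coefficients(points, simplex):
    """Barycentric coordinates of each point with respect to a k-simplex.

    points:  (m, k) array of embedded vertices.
    simplex: (k + 1, k) array, one simplex vertex (archetype) per row.
    Returns an (m, k + 1) array whose rows sum to 1. Points outside the
    simplex receive some negative coefficients.
    """
    k1 = simplex.shape[0]
    M = np.vstack([simplex.T, np.ones(k1)])             # (k+1, k+1) system
    rhs = np.vstack([points.T, np.ones(len(points))])   # (k+1, m)
    return np.linalg.solve(M, rhs).T

def most_similar(coeffs, query, top=3):
    """Indices of the `top` vertices whose coefficient vectors are closest
    to that of vertex `query`."""
    dist = np.linalg.norm(coeffs - coeffs[query], axis=1)
    dist[query] = np.inf                                 # exclude the query itself
    return np.argsort(dist)[:top]

simplex = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
points = np.array([[0.1, 0.1], [0.12, 0.1], [0.8, 0.1], [0.1, 0.8]])
coeffs = mixture_coefficients(points, simplex)
nearest = most_similar(coeffs, query=0, top=1)
```

In this toy example the first two points sit near the same archetype, so the query on vertex 0 returns vertex 1.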

5.3.2 Real-world Data Figure 4(a) shows the minimum
area fitted 2-simplex for the American college football
network, whose vertices correspond to teams and edges to
games among them. According to [30] the teams are divided
into conferences containing around 8-12 teams each and the
frequency of games between members of the same conference
is higher than between members of different conferences.
The three vertices of the fitted simplex correspond to
three conferences: PAC 10, SEC and MID. Furthermore,
VertexSim, using the fitted mixture coefficients, assigns higher
similarity to vertices of the same conference. Similar
remarks hold for Figure 4(b), which shows the minimum area
fitted 2-simplex for the political books network, whose
vertices represent books and whose edges represent copurchasing
by the same buyer. The three vertices of the simplex
correspond to liberal, conservative and 'neutral' books. The
latter is in quotes since around that vertex of the simplex
there also exist a few scattered liberal and conservative data
points/books. The data points whose mixture coefficient
corresponding to the 'neutral' vertex is close to 1 should
probably be characterized as moderate rather than neutral.
For both datasets, the vectors of mixture coefficients provide
us a way to determine vertex similarities in an interpretable
way. Unfortunately, due to space constraints we can only
include the figures shown in Figure 4. In an extended version
of this work, more figures shall be added which also highlight
with heatmaps the block structure of the above two datasets.
Concerning other datasets on which we have applied our
method (including the Roget Thesaurus network, Erdos
collaboration networks in





                          Erdos-Renyi            R-MAT
# Vertices                8      16     32       8      16     32
CondSim                   6/7    5/15   7/31     5/7    5/15   11/31
Belief Propagation [12]   0/7    0/15   0/31     0/7    0/15   0/31

Table 2: Results of our method versus the Belief Propagation
method of [12] on various random networks for permutations
whose number of cycles ranges from 1 to # vertices - 1.
The fractions indicate how many times an algorithm
found a permutation which makes the original graph and its
permuted version exactly the same.

1972, 1982 and 2002) the results of VertexSim are excellent
overall. Indicatively, we report a few highly similar pairs of
vertices according to VertexSim: (musician, poetry), (melody,
poetry), (voice, hearing) from the Roget Thesaurus network
and (Vojtech Rodl, Noga Alon), (Joel Spencer,
Janos Pach) from the Erdos collaboration network in 1972.



5.4 CondSim at Work Sections 5.4.1 and 5.4.2 present
the results of CondSim on synthetic and real data respectively.
In the following we use the gradient descent version
of our algorithm.

5.4.1 Synthetic Data We compare CondSim with the Belief
Propagation based method of [12] for a few synthetic
datasets in the following way. We generate a graph
(Erdos-Renyi-Gilbert or R-MAT) of n vertices and a permutation
with k cycles, where k ranges from 1 to n - 1. Notice that
we don't consider the identity permutation with n cycles.
We permute the graph according to the random permutation
and see whether the graph matching methods can perfectly
align the original graph and its permuted version. We use
two types of random graphs, namely Erdos-Renyi-Gilbert
and R-MAT graphs with 8, 16, 32 vertices. Table 2 shows the




Figure 5: A random execution of CondSimGradDesc on the
Football network with a random permutation. In the first
few steps the method makes large improvements in the
generalized condition number but then it gets stuck in a local
minimum of value 4.16.

results. As we see, CondSim significantly outperforms [12].
It's worth pointing out again that this is a validation test and
that fast graph isomorphism tests exist, e.g., for Erdos-Renyi
graphs see Ch. 3 of [11]. An interesting trend observed from the
Table and verified in other experiments as well is that the
fewer the cycles of the permutation, the more easily CondSim
gets "stuck" in local minima. On the positive side when the
number of cycles is small CondSim finds an optimal alignment
fairly easily.

5.4.2 CondSim as a Post-Processing Tool Due to the
computational cost of CondSim, a realistic use of it in
applications would be as a post-processing tool. We show an
example of how CondSim can be used as a post-processing
tool to improve the graph alignment in combination with the
Belief Propagation based method of [12]. We perform the
following simple experiment: we consider the graph G of
the Football network, see Table 1. We generate a permutation
uniformly at random and permute the labels of G accordingly.
We apply the function netalignbp() allowing all
possible edges in the bipartite graph of [12]. The number of
fixed points of the permutation we obtained is 0. The alignment
produced by [12] has recognized correctly 50 out of the
115 correct vertex-to-vertex assignments. The generalized
condition number equals 29.4. Applying the CondSimGradDesc
method to the alignment obtained from Belief Propagation,
we obtain a generalized condition number of
4.16, resulting in 74 correct assignments. Figure 5 shows the
progress of CondSim during this process and illustrates the
fact that it gets stuck in a local minimum. Understanding
this behavior is an interesting research problem, as are
several others which we discuss in the following Section.
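The bookkeeping behind this experiment (relabeling a graph, counting fixed points, and counting correct vertex-to-vertex assignments) can be sketched as follows. The conventions are assumptions made for illustration: a permutation is an index array, vertex i of the relabeled graph corresponds to vertex perm[i] of the original, and an alignment is correct where it agrees with the inverse permutation. The helper names are hypothetical and this is not the netalign code.

```python
import numpy as np

def permute_graph(adj, perm):
    """Relabel a graph: vertex i of the result is vertex perm[i] of adj."""
    return adj[np.ix_(perm, perm)]

def fixed_points(perm):
    """Number of positions i with perm[i] == i."""
    return int(np.sum(np.asarray(perm) == np.arange(len(perm))))

def correct_assignments(true_align, est_align):
    """Number of vertices mapped to the same target by both alignments."""
    return int(np.sum(np.asarray(true_align) == np.asarray(est_align)))

def is_perfect_alignment(G, H, align):
    """True if relabeling H by `align` reproduces G exactly."""
    return bool(np.array_equal(permute_graph(H, align), G))

# Usage on a small random graph.
rng = np.random.default_rng(0)
n = 10
upper = np.triu(rng.random((n, n)) < 0.5, k=1)
G = (upper | upper.T).astype(int)
perm = rng.permutation(n)
H = permute_graph(G, perm)
inv = np.argsort(perm)       # inverse permutation undoes the relabeling
est = inv.copy()
est[[0, 1]] = est[[1, 0]]    # an imperfect alignment with two mistakes
```

Counting agreements with one reference alignment can undercount when the graph has nontrivial automorphisms, which is why the exact-match check is kept as a separate function.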

6 Discussion & Research Directions 

The work we presented in Sections 3 and 4 opens several new
research directions, some of which we discuss in this section.
We picked the simplex as the geometric object to be fitted.
Given the simplex, we "reduce"² the problem of finding
the similarity of two vertices to comparing their corresponding,
unique, convex combination coefficients. This approach
has two important advantages, namely supporting interesting
types of queries such as "which other vertices are most
similar to this vertex?" and extracting useful knowledge about
our network by examining the archetypes. It appears from our
experimental results that simplexes capture well the geometry
of small and medium sized social networks. Two natural
questions come up: Is there always geometric structure in
social networks? Can we fit other geometric objects,
such as simplicial complexes, to capture more complex
geometric structure? Leskovec et al. [51] have studied
extensively properties of large scale networks and it appears that
there exists strong geometric structure in small and medium
sized networks like the ones we studied in Section 5, but this
structure typically decays as the size of the network grows.
The answer to the second question is an interesting research
problem, see also 12 II . Concerning our optimization problem,
several other optimization techniques can be applied
due to its special form, see for instance [66]. An experimental
study of these methods for our problem is a future task.
Finding statistically meaningful ways to pick the parameters
k, $\gamma$ is also a task to be completed. In our experiments,
small values for k, $\gamma$ empirically gave excellent results. Another
aspect concerns the graph embedding. We used a normalized
Laplacian graph embedding [13] in our experiments. There
exist new techniques which are computationally more expensive
than spectral-based dimensionality reduction methods
[48] but guarantee a stronger preservation of geometric
structure, see for instance [67]. Finally, an interesting question
concerns the relation of our method to community detection.
For instance, what is the result of clustering the vectors of
mixture coefficients and how does it compare to existing
clustering methods, e.g., [30]?

Concerning our second algorithmic scheme, an interesting
problem is to extend it to cases where the two graphs
have a different number of vertices, i.e., $|V_G| < |V_H|$. Even
if several heuristics can be thought of, such as adding several
one-degree vertices to the graph G until it reaches
the order $|V_H|$ and using perturbation theory [72] to
quantify the effect, the approach of Steiner tree preconditioners
is more promising, see [46]. An interesting view on our
method is the following: it is well known that we can view a
graph as an electrical network [26] and that for a given graph
G, $x^T L_G x$ equals the total energy consumed by the network,
where x is the vector of voltages on the endpoints/vertices.
Taking into account Inequality 2.1 and the above fact, we are
trying to align the two electrical networks in such a way that



2 Notice that it is not a real reduction since there is no uniquely accepted 
definition of similarity of two vertices. 



for any experiment we do, i.e., any setting of the voltage vector x,
the total energies consumed are similar. Setting the voltages
in such a way as to produce cuts [14], it also becomes evident that
the CondSim algorithmic scheme tries to make the cut structures
of the aligned graphs as similar as possible. We want
to stress that our work does not attempt to make any claims
either for the theoretical graph isomorphism problem
(which for many classes of graphs is solvable in polynomial
time, e.g., [10], and remains open for specific classes
of graphs which are highly symmetric, e.g., strongly regular
graphs) or for the practical one [59]. The complexity of graph
isomorphism remains open despite sophisticated attempts,
see [29] and [5]. However, a detailed theoretical analysis
of our method is an interesting research problem, as is the
less ambitious goal of understanding its performance in simple
cases such as paths or trees. Our method provides a new
similarity objective which has good reasons to exist. Finally,
our method can be used in combination with other methods,
e.g., Belief Propagation [12], as a post-processing tool in order
to improve the alignment. In Section 5 we show such a
use and verify in practice the value of our method. Using it
in other applications such as shape matching is a direction to
be pursued in the near future.
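The electrical-network reading of the Laplacian quadratic form can be checked numerically. The sketch below assumes unit resistances on all edges and uses hypothetical function names. It verifies that the quadratic form equals the sum of squared voltage differences over the edges, and that a 0/1 voltage vector turns this energy into the size of the corresponding cut.

```python
import numpy as np

def laplacian(adj):
    """Combinatorial Laplacian D - A of an undirected graph."""
    return np.diag(adj.sum(axis=1)) - adj

def energy(adj, x):
    """Quadratic form x^T L x."""
    return float(x @ laplacian(adj) @ x)

def energy_by_edges(adj, x):
    """Sum over edges {u, v} of (x_u - x_v)^2, each edge counted once."""
    n = len(x)
    return float(sum(
        adj[u, v] * (x[u] - x[v]) ** 2
        for u in range(n) for v in range(u + 1, n)
    ))

# Path 0-1-2-3 with unit weights.
adj = np.zeros((4, 4))
for u in range(3):
    adj[u, u + 1] = adj[u + 1, u] = 1.0

x = np.array([0.3, -1.0, 2.0, 0.5])
cut = np.array([1.0, 1.0, 0.0, 0.0])  # indicator of S = {0, 1}
cut_size = energy(adj, cut)           # one edge crosses: (1, 2)
```

A generalized condition number close to 1 therefore means that, after alignment, the two graphs assign similar energies to every voltage vector, including all cut indicators.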

7 Conclusion 

In this work we propose two novel algorithmic schemes Ver- 
texSim and CondSim for the vertex similarity and the graph 
matching problem respectively. Our main contributions are 
two-fold. First, we introduce a geometric perspective of an 
archetypal analysis on social networks which allows us to
answer efficiently queries of the type "which other vertices are
most similar to this vertex?" using appropriate data struc- 
tures such as (approximate) nearest neighbor data structures 
0. Secondly, we propose the generalized condition num- 
ber of two Laplacians as a measure of similarity of graphs 
based on the theory of support tree preconditioners [34]. We
justified in Section 4 in detail why this is a good measure
of similarity. We introduce three new algorithms which give
interesting and promising results. Our work opens numerous
new research directions. We discuss several such directions
in detail in Section 6.